AITopics | normal speech


Quartered Chirp Spectral Envelope for Whispered vs Normal Speech Classification

Joysingh, S. Johanan, Vijayalakshmi, P., Nagarajan, T.

arXiv.org Artificial Intelligence | Aug-27-2024

Whispered speech is gaining traction as an acceptable form of human-computer interaction. Systems that address multiple modes of speech require a robust front-end speech classifier. The performance of whispered vs normal speech classification drops in the presence of additive white Gaussian noise, since normal speech takes on some of the characteristics of whispered speech. In this work, we propose a new feature named the quartered chirp spectral envelope, a combination of the chirp spectrum and the quartered spectral envelope, to classify whispered and normal speech. The chirp spectrum can be fine-tuned to obtain customized features for a given task, and the quartered spectral envelope has been shown to work especially well for the current task. A one-dimensional convolutional neural network, which captures the trends in the spectral envelope, is trained on the feature. The proposed system performs better than the state of the art in the presence of white noise.
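The noise condition mentioned in this abstract can be illustrated with a minimal sketch that adds white Gaussian noise to a signal at a chosen signal-to-noise ratio. This is not the authors' evaluation code: the function name `add_white_noise`, the 220 Hz test tone, the 16 kHz sample rate, and the 10 dB target are all assumptions made for illustration.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise scaled to the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    # SNR (dB) = 10 * log10(signal_power / noise_power)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Usage: one second of a synthetic tone (assumed test signal) at 10 dB SNR.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 220.0 * t)
noisy = add_white_noise(clean, snr_db=10.0, rng=np.random.default_rng(0))

# Measure the SNR that was actually achieved.
measured_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

The measured SNR should land close to the requested value; it differs slightly because the noise power is estimated from a finite number of samples.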

classification, spectral envelope, speech

2408.14777

Country:

Asia > India > Tamil Nadu > Chennai (0.05)
North America > United States > South Carolina > Charleston County > Charleston (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

MaskCycleGAN-based Whisper to Normal Speech Conversion

Gupta, K. Rohith, Ramnath, K., Joysingh, S. Johanan, Vijayalakshmi, P., Nagarajan, T.

arXiv.org Artificial Intelligence | Aug-27-2024

Whisper to normal speech conversion is an active area of research. Various architectures based on generative adversarial networks have been proposed in the recent past. In particular, a recent study shows that MaskCycleGAN, a mask-guided generative adversarial network that preserves cycle consistency, performs well for voice conversion from spectrogram representations. In the current work, we present a MaskCycleGAN approach for the conversion of whispered speech to normal speech. We find that tuning the mask parameters and pre-processing the signal with a voice activity detector provide superior performance compared to the existing approach. The wTIMIT dataset is used for evaluation. Objective metrics such as PESQ and G-Loss are used to evaluate the converted speech, along with subjective evaluation using the mean opinion score. The results show that the proposed approach offers considerable benefits.

conversion, normal speech, speech

2408.14797

Country:

Asia > India > Tamil Nadu > Chennai (0.05)
North America > United States > South Carolina > Charleston County > Charleston (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.58)

Quartered Spectral Envelope and 1D-CNN-based Classification of Normally Phonated and Whispered Speech

Joysingh, S. Johanan, Vijayalakshmi, P., Nagarajan, T.

arXiv.org Artificial Intelligence | Aug-25-2024

Whisper, as a form of speech, is not sufficiently addressed by mainstream speech applications, because systems built for normal speech do not work as expected for whispered speech. A first step toward building a speech application that is inclusive of whispered speech is the successful classification of whispered speech and normal speech. Such a front-end classification system is expected to have high accuracy and low computational overhead, which is the scope of this paper. One of the characteristics of whispered speech is the absence of the fundamental frequency (or pitch), and hence of the pitch harmonics as well. The presence of the pitch and pitch harmonics in normal speech, and their absence in whispered speech, is evident in the spectral envelope of the Fourier transform. We observe that this characteristic is predominant in the first quarter of the spectrum, and exploit it as a feature. We propose the use of one-dimensional convolutional neural networks (1D-CNN) to capture these features from the quartered spectral envelope (QSE). The system yields an accuracy of 99.31% when trained and tested on the wTIMIT dataset, and 100% on the CHAINS dataset. The proposed feature is compared with Mel frequency cepstral coefficients (MFCC), a staple in the speech domain. The proposed classification system is also compared with the state-of-the-art system based on log-filterbank energy (LFBE) features trained on a long short-term memory (LSTM) network. The proposed system based on 1D-CNN performs better than, or as well as, the state of the art across multiple experiments. It also converges sooner, with lower computational overhead. Finally, the proposed system is evaluated in the presence of white noise at various signal-to-noise ratios and found to be robust.
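The idea of keeping only the first quarter of the spectrum can be sketched as follows. This is one plausible reading of the abstract, not the authors' feature extraction: it assumes the "spectral envelope" is the log-magnitude Fourier spectrum of a windowed frame and that "first quarter" means the lowest quarter of the bins up to the Nyquist frequency. The function name `quartered_log_spectrum`, the 400-sample frame, the 512-point FFT, and the synthetic test signals are all hypothetical.

```python
import numpy as np


def quartered_log_spectrum(frame, n_fft=512):
    """Log-magnitude spectrum of one frame, truncated to the lowest quarter of bins.

    Illustrative assumption: with n_fft = 512 there are 256 bins below Nyquist,
    so the first quarter is the lowest 64 bins (0 to 2 kHz at 16 kHz sampling).
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    log_magnitude = np.log(magnitude + 1e-10)  # small offset avoids log(0)
    return log_magnitude[: n_fft // 8]


# Synthetic stand-ins: a harmonic signal for voiced speech, noise for whisper.
sample_rate = 16000
t = np.arange(400) / sample_rate  # one 25 ms frame
voiced_like = sum(np.sin(2 * np.pi * 200.0 * k * t) / k for k in range(1, 6))
whisper_like = np.random.default_rng(0).normal(size=400)

qse_voiced = quartered_log_spectrum(voiced_like)
qse_whisper = quartered_log_spectrum(whisper_like)
```

In this toy setup the harmonic signal shows regularly spaced peaks at multiples of 200 Hz in the retained bins, while the noise frame does not. Each such vector could then be fed to a one-dimensional convolutional classifier, which is not shown here.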

1d-cnn-based classification, pitch harmonic, speech

doi: 10.1007/s00034-022-02263-5

2408.13746

Country:

Asia > India > Tamil Nadu > Chennai (0.04)
North America > United States > South Carolina > Charleston County > Charleston (0.04)
North America > United States > Illinois (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A comparative study of Grid and Natural sentences effects on Normal-to-Lombard conversion

Chen, Hongyang, Yang, Yuhong, Liu, Qingmu, Li, Baifeng, Tu, Weiping, Lin, Song

arXiv.org Artificial Intelligence | Sep-19-2023

Grid sentences are commonly used for studying the Lombard effect and Normal-to-Lombard conversion. However, it is unclear whether Normal-to-Lombard models trained on grid sentences are sufficient for improving natural speech intelligibility in real-world applications. This paper presents the recording of a parallel Lombard corpus (called Lombard Chinese TIMIT, LCT) built from natural sentences extracted from Chinese TIMIT. We then compare natural and grid sentences in terms of the Lombard effect and Normal-to-Lombard conversion, using LCT and the Enhanced MAndarin Lombard Grid corpus (EMALG). Through a parametric analysis of the Lombard effect, we find that as the noise level increases, natural sentences and grid sentences exhibit similar changes in parameters, except that grid sentences show a greater increase in the alpha ratio. In a subjective intelligibility assessment across genders and signal-to-noise ratios, the StarGAN model trained on EMALG consistently outperforms the model trained on LCT in terms of improving intelligibility. This superior performance may be attributed to EMALG's larger alpha ratio increase from normal to Lombard speech.

grid sentence, natural sentence, speech

2309.10485

Country:

Asia > China > Hubei Province > Wuhan (0.06)
North America > Canada > Quebec > Montreal (0.05)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Investigation of Data Augmentation Techniques for Disordered Speech Recognition

Geng, Mengzhe, Xie, Xurong, Liu, Shansong, Yu, Jianwei, Hu, Shoukang, Liu, Xunying, Meng, Helen

arXiv.org Artificial Intelligence | Jan-14-2022

Disordered speech recognition is a highly challenging task. The underlying neuro-motor conditions of people with speech disorders, often compounded by co-occurring physical disabilities, make it difficult to collect the large quantities of speech required for system development. This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation. Both normal and disordered speech were exploited in the augmentation process. Variability among impaired speakers in both the original and augmented data was modeled using speaker adaptive training based on learning hidden unit contributions (LHUC). The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach (based on speed perturbation), produced a word error rate (WER) reduction of up to 2.92% absolute (9.3% relative) over the baseline system without data augmentation, and gave an overall WER of 26.37% on the test set containing 16 dysarthric speakers.
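Speed perturbation, one of the augmentation techniques named in this abstract, can be sketched as resampling the waveform along a stretched or compressed time axis, which changes both duration and pitch (tempo perturbation, by contrast, changes duration while preserving pitch). The sketch below is a simplified illustration, not the paper's pipeline: it uses plain linear interpolation with no anti-aliasing filter, and the function name `speed_perturb` and the factors 0.9 and 1.1 are assumptions.

```python
import numpy as np


def speed_perturb(signal, factor):
    """Resample `signal` so it plays `factor` times faster (factor > 1 shortens it).

    Hypothetical helper: linear interpolation only, no anti-aliasing filter.
    """
    signal = np.asarray(signal, dtype=float)
    n_out = int(round(len(signal) / factor))
    # Positions in the original signal that each output sample reads from.
    source_positions = np.arange(n_out) * factor
    return np.interp(source_positions, np.arange(len(signal)), signal)


# Usage: create slower and faster copies of a one-second synthetic tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
utterance = np.sin(2 * np.pi * 150.0 * t)  # assumed stand-in for real speech

slow = speed_perturb(utterance, 0.9)  # longer, lower pitch
fast = speed_perturb(utterance, 1.1)  # shorter, higher pitch
augmented = [utterance, slow, fast]
```

A production pipeline would normally use a band-limited resampler rather than linear interpolation, since compressing the time axis can introduce aliasing.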

disordered speech, perturbation, speech

doi: 10.21437/Interspeech.2020-1161

2201.05562

Country:

Asia > China > Hong Kong (0.04)
Europe > Sweden > Skåne County > Malmö (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Severely paralyzed man communicates using brain signals sent to his vocal tract

Engadget | Jul-15-2021, 11:05:24 GMT

A severely paralyzed man has been able to communicate using a new type of technology that translates signals sent from his brain to his vocal tract directly into words that appear on a screen. Developed by researchers at UC San Francisco, the technique is a more natural way for people with speech loss to communicate than other methods we've seen to date. So far, neuroprosthetic technology has allowed paralyzed users to type out only one letter at a time, a process that can be slow and laborious. It has also tapped parts of the brain that control the arm or hand, a system that's not necessarily intuitive for the subject. The UCSF system, however, uses an implant that's placed directly on the part of the brain dedicated to speech.

brain signal, speech, vocal tract

Country: North America > United States > California > San Francisco County > San Francisco (0.26)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.53)
Health & Medicine > Health Care Technology (0.42)

Technology: Information Technology > Artificial Intelligence (0.77)

Whisper to Alexa, and She'll Whisper Back: Alexa Blogs

#artificialintelligence | Sep-26-2018, 13:07:09 GMT

If you're in a room where a child has just fallen asleep, and someone else walks in, you might start speaking in a whisper, to indicate that you're trying to keep the room quiet. The other person will probably start whispering, too. We would like Alexa to react to conversational cues in just such a natural, intuitive way, and toward that end, Amazon last week announced Alexa's new whisper mode, which will let Alexa-enabled devices respond to whispered speech by whispering back. At the IEEE Workshop on Spoken Language Technology, in December, my colleagues and I will present a paper that describes the techniques we used to enable whisper mode. The ultimate implementation differs somewhat, but the basic principles are the same.

artificial intelligence, machine learning, speech

Industry: Retail > Online (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
